Improving Quality of Search Results Clustering with Approximate Matrix Factorisations
Abstract
In this paper we show how approximate matrix factorisations can be used to organise document summaries returned by a search engine into meaningful thematic categories. We compare four different factorisations (SVD, NMF, LNMF and K-Means/Concept Decomposition) with respect to topic separation capability, outlier detection and label quality. We also compare our approach with two other clustering algorithms: Suffix Tree Clustering (STC) and Tolerance Rough Set Clustering (TRC). For our experiments we use the standard merge-then-cluster approach based on the Open Directory Project web catalogue as a source of human-clustered document summaries.
Similar resources
A Free Line Search Steepest Descent Method for Solving Unconstrained Optimization Problems
In this paper, we solve an unconstrained optimization problem using a free line search steepest descent method. First, we propose a double parameter scaled quasi-Newton formula for calculating an approximation of the Hessian matrix. The approximation obtained from this formula is a positive definite matrix that satisfies the standard secant relation. We also show that the largest eigenvalue...
The ensemble clustering with maximize diversity using evolutionary optimization algorithms
Data clustering is one of the main steps in data mining, which is responsible for exploring hidden patterns in non-tagged data. Due to the complexity of the problem and the weakness of the basic clustering methods, most studies today are guided by clustering ensemble methods. Diversity in primary results is one of the most important factors that can affect the quality of the final results. Also...
Water Quality Zoning of Rivers by the Technique of Fuzzy Clustering Analysis
Zoning the pollution of a river may be the first or even the most important step in water quality management. In order to resolve its pollution, fuzzy clustering analysis may be used whenever a composite classification of water quality incorporates multiple parameters. In such cases, the technique may be used as a complement or an alternative to comprehensive assessment. In fuzzy cluster...
Carrot2 and Language Properties in Web Search Results Clustering
This paper relates to a technique of improving results visualization in Web search engines known as search results clustering. We introduce an open extensible research system for examination and development of search results clustering algorithms – Carrot. We also discuss attempts to measuring quality of discovered clusters and demonstrate results of our experiments with quality assessment when...
Publication date: 2006